Heart Disease Analysis

by Mochamad Derisman Nugraha, using a public health dataset ( https://www.kaggle.com/ronitf/heart-disease-uci or https://archive.ics.uci.edu/ml/datasets/Heart+Disease )

Heart disease, also known as cardiovascular disease, covers a wide range of conditions that affect the heart and has been the leading cause of death worldwide over the past few decades. Because heart disease has many risk factors, there is a pressing need for accurate, reliable and practical approaches to early diagnosis so that the disease can be managed quickly. Data mining is a commonly used technique for processing very large datasets in the healthcare domain. Researchers have applied a range of data mining and machine learning techniques to large, complex medical data to help healthcare professionals predict heart disease.

Context

• age: The person’s age in years

• sex: The person’s sex (1 = male, 0 = female)

• cp: chest pain type

— Value 0: asymptomatic

— Value 1: atypical angina

— Value 2: non-anginal pain

— Value 3: typical angina

• trestbps: The person’s resting blood pressure (mm Hg on admission to the hospital)

• chol: The person’s cholesterol measurement in mg/dl

• fbs: The person’s fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)

• restecg: resting electrocardiographic results

— Value 0: showing probable or definite left ventricular hypertrophy by Estes’ criteria

— Value 1: normal

— Value 2: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)

• thalach: The person’s maximum heart rate achieved

• exang: Exercise induced angina (1 = yes; 0 = no)

• oldpeak: ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot)

• slope: the slope of the peak exercise ST segment (0: downsloping; 1: flat; 2: upsloping)

• ca: The number of major vessels (0–3) colored by fluoroscopy

• thal: A blood disorder called thalassemia

— Value 0: NULL (dropped from the dataset previously)

— Value 1: fixed defect (no blood flow in some part of the heart)

— Value 2: normal blood flow

— Value 3: reversible defect (a blood flow is observed but it is not normal)

• target: Heart disease (1 = no, 0 = yes)

Source: https://towardsdatascience.com/heart-disease-uci-diagnosis-prediction-b1943ee835a7

Import Module & Data
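The import and loading step can be sketched as follows. This is a minimal sketch, assuming pandas and a CSV in the Kaggle column layout. The `load_heart` helper, the file name `heart.csv` and the three sample rows are illustrative assumptions, not the notebook's actual cell.

```python
import io

import pandas as pd

# Illustrative rows in the assumed Kaggle column layout (not the full dataset).
SAMPLE_CSV = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
"""


def load_heart(source):
    """Read the heart disease CSV from a file path or a file-like object."""
    return pd.read_csv(source)


# With the real file this would be load_heart("heart.csv") (file name assumed).
df = load_heart(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(df.head())
```

Reading from a file-like object keeps the example runnable without the download; swapping in a path is the only change needed for the real data.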

Describe Data

The dataframe has no null values.
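A null check of this kind can be done with `isnull()`. The sketch below uses a small made-up frame rather than the real data, so it only illustrates the calls involved.

```python
import pandas as pd

# Made-up rows for illustration; the real check runs on the full dataframe.
df = pd.DataFrame({
    "age": [63, 37, 41],
    "chol": [233, 250, 204],
    "target": [1, 1, 1],
})

# Count of missing values per column.
null_counts = df.isnull().sum()
print(null_counts)

# Single yes/no answer for the whole frame.
has_nulls = bool(df.isnull().values.any())
print("any nulls:", has_nulls)
```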

Data Stats

Describe Statistics

Mode

Variance

Standard Deviation

InterQuartileRange (IQR)
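The mode, variance, standard deviation and IQR listed above can each be computed directly with pandas. The following is a sketch on a made-up resting blood pressure series, not the notebook's own output.

```python
import pandas as pd

# Made-up resting blood pressure readings, for illustration only.
trestbps = pd.Series([120, 130, 130, 140, 145, 150, 160, 172], name="trestbps")

mode = trestbps.mode().iloc[0]   # most frequent value
variance = trestbps.var()        # sample variance (pandas default is ddof=1)
std_dev = trestbps.std()         # sample standard deviation (ddof=1)

# The IQR is the distance between the 75th and 25th percentiles.
q1 = trestbps.quantile(0.25)
q3 = trestbps.quantile(0.75)
iqr = q3 - q1

print(f"mode={mode}, variance={variance:.2f}, std={std_dev:.2f}, IQR={iqr}")
```

Note that pandas reports the sample variance and standard deviation by default; pass `ddof=0` for the population versions.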

Probability

Slovin's formula is used to determine the sample size for survey research, and its computation is based almost solely on the population size. It is given as n = N / (1 + N e^2), where n is the sample size, N is the population size and e is the margin of error chosen by the researcher. However, misuse of the formula is now a popular research subject here in my country, and students are usually discouraged from using it even though the reasons are not made clear to them. It would help to know who Slovin really was and what the basis of his formula is.
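As a worked example of the formula, here is a small sketch using only the standard library. The function name `slovin_sample_size`, the choice to round up to a whole respondent, and the population of 1000 are assumptions made for illustration.

```python
import math


def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Return n = N / (1 + N * e**2), rounded up to a whole respondent."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# Hypothetical population of 1000 with a 5% margin of error:
# 1000 / (1 + 1000 * 0.0025) = 1000 / 3.5, which rounds up to 286.
n = slovin_sample_size(1000, 0.05)
print(n)
```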



EDA

Numerical columns: age, trestbps, cholesterol, thalach, oldpeak, ca, target

Categorical columns: sex, chest_pain, fbs, rest_ecg, exang, slope, thal
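One way to arrive at a numerical/categorical split like this is to rename the columns, mark the coded columns as categorical, and select by dtype. The sketch below assumes that approach; the original cell is not shown, so the renaming map, the `category` dtype and the sample rows are all assumptions.

```python
import pandas as pd

# Illustrative rows in the assumed Kaggle column layout.
df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "sex": [1, 1, 0, 1],
    "cp": [3, 2, 1, 1],
    "trestbps": [145, 130, 130, 120],
    "chol": [233, 250, 204, 236],
    "fbs": [1, 0, 0, 0],
    "restecg": [0, 1, 0, 1],
    "thalach": [150, 187, 172, 178],
    "exang": [0, 0, 0, 0],
    "oldpeak": [2.3, 3.5, 1.4, 0.8],
    "slope": [0, 0, 2, 2],
    "ca": [0, 0, 0, 0],
    "thal": [1, 2, 2, 2],
    "target": [1, 1, 1, 1],
})

# Rename to the longer column names used in the EDA section (assumed mapping).
df = df.rename(columns={"cp": "chest_pain", "chol": "cholesterol", "restecg": "rest_ecg"})

# Treat the integer-coded columns as categorical so dtype selection can split them out.
categorical_names = ["sex", "chest_pain", "fbs", "rest_ecg", "exang", "slope", "thal"]
df = df.astype({name: "category" for name in categorical_names})

numerical = df.select_dtypes(include="number").columns
categorical = df.select_dtypes(include="category").columns
print(list(numerical))
print(list(categorical))
```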

Numerical Data

• Resting blood pressure, cholesterol, maximum heart rate, ST depression and number of major vessels all contain outliers
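Outliers like these are commonly flagged with the 1.5 × IQR rule that boxplots use. The sketch below assumes that rule; the method used in the original notebook is not shown, and the `iqr_outliers` helper and the cholesterol values are made up for illustration.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Made-up cholesterol readings with one extreme value.
cholesterol = pd.Series([204, 233, 236, 250, 263, 294, 564], name="cholesterol")
outliers = iqr_outliers(cholesterol)
print(outliers)
```

Applying the same helper to each numerical column gives a per-column list of candidate outliers to inspect before deciding whether to keep or drop them.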

Categorical Data

Characteristic

Summary of Patient Characteristics

• Most patients with heart disease are male

• The majority of patients diagnosed with heart disease have non-anginal pain, normal thalassemia, ST-T wave abnormality, a downsloping ST slope, fasting blood sugar of 120 mg/dl or less, and no exercise-induced angina

• Among patients with heart disease, 0 is the most common number of major vessels

1. Gender

Summary of the Sex (Gender) category:

Male patients are the majority, making up 68.3% of the population, and account for most of the detected heart disease cases
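A share like this can be computed with `value_counts(normalize=True)`, and the split by diagnosis with `pd.crosstab`. The sketch below runs on a made-up eight-row frame, so its numbers are not the dataset's; it follows the coding from the data dictionary above (sex: 1 = male, 0 = female; target: 0 = heart disease).

```python
import pandas as pd

# Made-up rows for illustration; these are not the real proportions.
df = pd.DataFrame({
    "sex": [1, 1, 0, 1, 0, 1, 1, 0],
    "target": [0, 1, 1, 0, 1, 0, 1, 1],
})

# Percentage of patients of each sex.
sex_share = df["sex"].map({1: "male", 0: "female"}).value_counts(normalize=True) * 100
print(sex_share)

# Patient counts by sex (rows) and target (columns).
by_target = pd.crosstab(df["sex"], df["target"])
print(by_target)
```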